An Anytime Approximation for Optimizing Policies Under Uncertainty
Abstract
The paper presents an approximation scheme for the task of maximizing expected utility over a set of policies, that is, a set of possible ways of reacting to observations about an uncertain state of the world. The scheme, which is based on the mini-bucket idea for approximating variable-elimination algorithms, is parameterized, allowing flexible control of the trade-off between efficiency and accuracy. Furthermore, since the scheme outputs a bound on its accuracy, it yields an anytime algorithm that can terminate once a desired level of accuracy is achieved. The presented scheme should be viewed as a guiding framework for approximation that can be improved in a variety of ways.
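The anytime principle described above can be illustrated with a small sketch. This is not the paper's mini-bucket elimination algorithm itself; it is a simplified stand-in that uses the same core inequality (the max of a sum is bounded above by the sum of independent per-group maxima), with a scope-size parameter controlling the efficiency/accuracy trade-off. All names and the factor representation are illustrative assumptions.

```python
import itertools

def group_max(factors, domain=(0, 1)):
    """Brute-force max of the sum of factors over their joint scope.

    Each factor is a dict with a "scope" tuple of variable ids and a
    "table" mapping value tuples (in scope order) to numbers.
    """
    scope = sorted({v for f in factors for v in f["scope"]})
    best, best_assign = float("-inf"), {}
    for values in itertools.product(domain, repeat=len(scope)):
        assign = dict(zip(scope, values))
        total = sum(f["table"][tuple(assign[v] for v in f["scope"])]
                    for f in factors)
        if total > best:
            best, best_assign = total, assign
    return best, best_assign

def anytime_bounds(factors, ibound_max, eps=0.0):
    """Anytime loop in the mini-bucket spirit: partition the factors into
    groups whose combined scope has at most `ibound` variables, upper-bound
    the optimum by the sum of per-group maxima, and lower-bound it by
    evaluating the concatenated greedy assignment in the true objective.
    Raising `ibound` tightens both bounds; stop once the gap is <= eps.
    """
    lower = upper = None
    for ibound in range(1, ibound_max + 1):
        # Greedy partition of the factor set under the scope-size limit.
        # A factor whose own scope exceeds ibound simply forms its own
        # group; the bound remains valid because that group is solved exactly.
        groups = []
        for f in factors:
            placed = False
            for g in groups:
                scope = {v for h in g for v in h["scope"]} | set(f["scope"])
                if len(scope) <= ibound:
                    g.append(f)
                    placed = True
                    break
            if not placed:
                groups.append([f])
        upper, assign = 0.0, {}
        for g in groups:
            m, a = group_max(g)
            upper += m
            for v, val in a.items():
                assign.setdefault(v, val)  # earlier groups win on conflicts
        lower = sum(f["table"][tuple(assign[v] for v in f["scope"])]
                    for f in factors)
        if upper - lower <= eps:
            break  # desired accuracy reached; terminate early
    return lower, upper
```

With `ibound_max` equal to the number of variables the loop eventually becomes exact (one group, brute-forced), so the bounds meet; smaller limits trade accuracy for cheaper per-group maximizations.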
Related papers
An Anytime Algorithm for Decision Making under Uncertainty
We present an anytime algorithm that computes policies for decision problems represented as multi-stage influence diagrams. Our algorithm constructs policies incrementally, starting from a policy that makes no use of the available information. The incremental process constructs policies that include more of the information available to the decision maker at each step. While the process conv...
On-Line Search for Solving Markov Decision Processes via Heuristic Sampling
Markov Decision Processes (MDPs) have become a standard for solving problems of sequential decision making under uncertainty. The usual goal in this framework is the computation of an optimal policy that defines the optimal action for every state of the system. For complex MDPs, exact computation of optimal policies is often intractable. Several approaches have been developed...
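For context, the "optimal policy that defines the optimal action for every state" mentioned above is classically computed by value iteration; the online-search paper exists precisely because this exact sweep over all states becomes intractable for large MDPs. A minimal sketch, with an assumed transition/reward representation (not the paper's algorithm):

```python
def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Standard value iteration for a finite MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] the
    immediate reward for taking action a in state s.  Returns the optimal
    state values and the greedy (optimal) policy.
    """
    states = range(len(P))
    V = [0.0] * len(P)
    while True:
        # Bellman optimality backup for every state.
        V_new = [max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                     for a in range(len(P[s])))
                 for s in states]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            break
        V = V_new
    # Extract the greedy policy from the converged values.
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a]
                  + gamma * sum(p * V[t] for p, t in P[s][a]))
              for s in states]
    return V, policy
```

The per-iteration cost grows with the number of states, which is exactly what sampling-based online search avoids by only expanding states reachable from the current one.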
DESPOT: Online POMDP Planning with Regularization
POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable, due to the “curse of dimensionality” and the “curse of history”. This paper presents an online search algorithm that alleviates these difficulties by focusing on a set of sampled scenarios. The execution of all policies on the sampled scenarios is captured in a Determinized Sparse Partiall...
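The key idea in the abstract above, evaluating all candidate policies on one fixed set of sampled scenarios, can be shown in miniature. This is not DESPOT itself; it is only the scenario-sampling core, with illustrative names and a caller-supplied simulator:

```python
import random

def evaluate_on_scenarios(policy, simulate, n_scenarios=500, seed=1):
    """Average a policy's simulated return over a fixed set of sampled
    scenarios (here, pre-drawn random numbers).  Fixing the scenarios
    means every policy is judged on identical random outcomes, which
    removes evaluation noise from policy comparisons.
    """
    rng = random.Random(seed)
    scenarios = [rng.random() for _ in range(n_scenarios)]
    return sum(simulate(policy, s) for s in scenarios) / n_scenarios
```

Because the scenario set is shared, a policy that dominates another scenario-by-scenario is guaranteed to score at least as well, something independent Monte Carlo runs cannot promise.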
Robust Online Optimization of Reward-Uncertain MDPs
Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown, despite their theoretical intractability, how the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate regret computation. However, the numb...
Enhancing the Anytime Behaviour of Mixed CSP-Based Planning
An algorithm with the anytime property has an approximate solution always available, and the longer the algorithm runs, the better the solution becomes. Anytime planning is important in domains such as aerospace, where time for reasoning is limited and a viable (if suboptimal) course of action must always be available. In this paper we study anytime solving of a planning problem under uncertain...
Publication date: 2000